Abstract

We present an automated system for detecting, tracking, and cataloging emerging active regions throughout their evolution and decay using SOHO Michelson Doppler Interferometer (MDI) magnetograms. The SolarMonitor Active Region Tracking (SMART) algorithm relies on consecutive image differencing to remove both quiet-Sun and transient magnetic features, and region-growing techniques to group flux concentrations into classifiable features. We determine magnetic properties such as region size, total flux, flux imbalance, flux emergence rate, Schrijver's R-value, R* (a modified version of R), and Falconer's measurement of non-potentiality. A persistence algorithm is used to associate developed active regions with emerging flux regions in previous measurements, and to track regions beyond the limb through multiple solar rotations. We find that the total number and area of magnetic regions on disk vary with the sunspot cycle. While sunspot numbers are a proxy for the solar magnetic field, SMART offers a direct diagnostic of the surface magnetic field and its variation over timescales of hours to years. SMART will form the basis of the active region extraction and tracking algorithm for the Heliophysics Integrated Observatory (HELIO).

Keywords: active regions, feature detection, region growing algorithm, 
space weather 
1. Introduction



The automatic identification and characterization of solar features is of great importance to both solar activity monitoring and space weather operations. This has become a particular issue due to the high spatial and temporal resolution solar imagers, such as those flown on the Project for On-Board Autonomy 2 (PROBA2) and Solar Dynamics Observatory (SDO), which will force data providers to distribute subsets of their science products instead of the full image data set. Traditionally, solar feature catalogs were created by hand, using visual recognition to record the position, size, and other properties of features (e.g., Carrington, 1854). An early attempt to overcome this was SolarMonitor^1 (Gallagher et al., 2002), which labels active regions (ARs) in solar images using National Oceanic and Atmospheric Administration (NOAA) numbers and locations cataloged by the NOAA Space Weather Prediction Center. More recently, researchers have begun to catalog features using automated methods. The European Grid of Solar Observations^2 (EGSO; Bentley et al., 2002), for example, catalogs solar features using Hα and Ca II K images and a neural network algorithm (Zharkova et al., 2005; Zharkova & Schetinin, 2005).



One of the first applications of automated image processing techniques to AR identification is the Automated Region Selection Extraction algorithm (McAteer et al., 2005a). This algorithm creates a binary mask of features using a static noise threshold applied to a line-of-sight (LOS) magnetogram. A sub-image is extracted, centered on the pixel with the highest value. Closed contours enclosing an area centered on the seed are grouped as a region. The detected region is saved and removed from the magnetogram. The pixel of the next highest value is selected and the process repeated. Some saved regions are associated with NOAA cataloged regions, which may be tracked across the disk. More recently, LaBonte et al. (2007) extract ARs using full-disk magnetograms that are smoothed by roughly one supergranule diameter. Region candidates are tested for bipolar flux and east-west orientation. A dynamic noise threshold is calculated using the median of average magnetic field values for a series of annuli centered on the AR candidate. The AR boundary is chosen by comparing the average magnetic field values of smaller annuli with the calculated noise threshold. Using annuli to test for region boundaries allows one to isolate ARs from large AR complexes since the dynamic noise threshold will be set relative to the surrounding regions.

1 See: http://www.SolarMonitor.org
2 See: http://www.egso.org

An alternative to solely identifying ARs using their magnetic signatures was discussed in a series of papers by Qahwaji & Colak (2005) and Colak & Qahwaji (2008, 2009). In their hybrid extraction algorithm, sunspots detected in white-light images are grouped using feature boundaries extracted from magnetograms. Both forms of data are segmented using dynamic thresholding. White-light candidates coinciding with magnetogram candidates are grouped using growing circles, while a neural network is used to determine which candidates to retain and how to group them. This system has the advantage that it compares well with the NOAA AR identification scheme, but it does not give any insight into AR properties thought to be related to flaring (e.g., horizontal B-field gradients, total flux, fractal dimension, etc.).

A number of algorithms have been developed to measure AR magnetic characteristics postulated to be related to flaring: Gallagher et al. (2002) measure gradients in the magnetic field of ARs; McAteer et al. (2005b) establish an AR fractal dimension lower limit of 1.2 for M- and X-class flares to occur; Georgoulis & Rust (2007) calculate the magnetic connectivity between fragments of an AR; Conlon et al. (2008, 2010) measure the multifractal nature of AR flux; Hewett et al. (2008) determine the multiscale power-law index; Falconer et al. (2008) establish a gauge of AR non-potentiality; and Zhang et al. (2009) determine basic field properties and the degree of AR polarity (bipole, quadrupole, etc.). The overall aim of these algorithms is to extract a physically-motivated measure of the characteristics of a region, subsequently using this information to better understand the fundamental physics of ARs, and to relate the properties of AR magnetic fields to their flaring potential. This is essential to building an automated AR monitoring and flare forecasting system as discussed in McAteer et al. (2009).

In this paper we present a new algorithm, the SolarMonitor Active Region Tracking (SMART) algorithm, which will form the basis of AR identification for the Heliophysics Integrated Observatory^3 (HELIO). SMART combines extraction techniques with AR magnetic property determinations (Section 2), region tracking, and cataloging (Section 3). A cross comparison of SMART and NOAA detections as well as a discussion of errors in property measurements and feature tracking test cases are presented in Section 4. Our conclusions and prospects for future work are then given in the final section.

3 See: http://www.helio-vo.org

Figure 1: Flow chart summarizing the SMART algorithm processing method.



2. Feature Extraction 

The SMART method of operation is summarized in Figure 1. Initially, magnetograms are segmented into individual feature masks (Section 2.1). A characterization algorithm is then run on each extracted region to determine feature properties (Section 2.2). These region properties are subsequently used to classify the form of solar features (Section 2.3). The final output is a set of data structures for each magnetogram, including each feature present. The following sub-sections provide details on the operations outlined above.



2.1. Segmentation 

The segmentation process depicted in Figure 2 begins with two consecutive Solar and Heliospheric Observatory (SOHO)/Michelson Doppler Interferometer (MDI; Scherrer et al., 1995) full-disk, line-of-sight (LOS), level 1.8 magnetograms. Nominally these are 96 minutes apart, but there are sporadic gaps in the MDI data set (only rarely is there an entire day with no data). We use two magnetograms recorded close in time to remove transient features and extract time-dependent properties. The magnetogram of interest (Figure 3A) is denoted as $B_t$ and the previous magnetogram as $B_{t-\Delta t}$. If $\Delta t$ is greater than one day, the detections are discarded.

Figure 2: Flow chart summarizing the magnetogram segmentation method.

Magnetograms are first checked for problems using properties extracted 
from the Flexible Image Transport System (FITS) data file headers, such 
as the spacecraft roll angle and the number of missing pixel values. Mag- 
netograms are rotated as necessary, so that solar north points up, using 
nearest neighbor sampling interpolation, while those with missing values are 
discarded. A solar energetic particle (SEP) event which occurs during a 
magnetogram exposure results in many bright pixels scattered about the im- 
age. This does not interfere with the AR detection, as the bright pixels are 
smoothed out, but can affect magnetic property determinations. 
Figure 3: Processing steps for an example feature extraction on 25 November 2003. A) Calibrated magnetogram $B_t$ clipped to ±1000 G and cropped around NOAA 10507. B) $B_t$ with gaussian smoothing and noise thresholding. C) Mask ($M_t$) with transient filtering and area threshold of 50 pixels. D) Final indexed grown feature mask, $IGM_t$.



We first apply smoothing, a noise threshold, and a LOS correction, in that order, to the data (Figure 3B). This set of operations is represented by the STL Process in Figure 2. The smoothing operation is necessary to remove ephemeral regions that have size scales on the order of 10 Mm (Hagenaar, 2001), which corresponds to 7 MDI pixels at disk center. To this end, $B_{t-\Delta t}$ and $B_t$ are convolved with a 10 × 10 pixel$^2$ kernel containing a 2D gaussian with a full-width at half-maximum (FWHM) of 5 pixels.
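The smoothing step can be illustrated with a minimal sketch. It assumes the kernel is normalized to unit sum and that the image is zero-padded at its edges; neither detail is stated in the text, and the function names `gaussian_kernel` and `smooth` are hypothetical.

```python
import numpy as np

def gaussian_kernel(size=10, fwhm=5.0):
    """Normalized 2D Gaussian on a size x size grid, FWHM given in pixels."""
    # Convert FWHM to the Gaussian standard deviation.
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    ax = np.arange(size) - (size - 1) / 2.0  # pixel offsets from kernel center
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()  # unit sum (assumption)

def smooth(image, kernel):
    """Same-size convolution with zero padding (assumption)."""
    kh, kw = kernel.shape
    top, left = kh // 2, kw // 2
    padded = np.pad(image, ((top, kh - 1 - top), (left, kw - 1 - left)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    # The kernel is symmetric, so correlation and convolution coincide.
    return np.einsum("ijkl,kl->ij", windows, kernel)
```

In this sketch both $B_{t-\Delta t}$ and $B_t$ would be passed through `smooth` with the same kernel before thresholding.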

We use a static threshold to remove the background. Figure 4 shows the variation in the monthly averages of maximum values of quiet-Sun (QS) magnetic field recorded throughout cycle 23. The maximum value varies by roughly 5 G over the cycle, which is less than the monthly standard deviation of these maxima, so a static threshold is acceptable. The mean of the maximum unsigned QS magnetic field values is ~70 G. Figure 5 shows a smoothed AR and nearby QS region contoured at ±70 G. The histogram shows the distributions of magnetic field values for the AR and QS regions, including the difference between the two distributions. Thresholding at the ±70 G level removes small features which have been smoothed out by the gaussian convolution but maintains extended strong-field features, such as bipolar and plage regions. Pixels in $B_{t-\Delta t}$ and $B_t$ with absolute values less than 70 G are zeroed.

Figure 4: The maximum of quiet-Sun magnetic field values over solar cycle 23, plotted against years since 1 January 1997. Each point is the mean maximum value for a month of magnetograms (nominally two per day, but less for particularly active periods). The error bars are the standard deviations of each month's set of values. The continuous gray line is the smoothed, monthly sunspot number from the Solar Influences Data Analysis Center (SIDC; http://sidc.oma.be).

In the case where magnetic fields are primarily vertical to the solar surface, the LOS component of the field is reduced toward the limb. As such, a feature with the same magnetic field strength and orientation with respect to the solar surface will appear lower in magnitude when located toward the solar limb than at disk center. This LOS effect is corrected at each MDI pixel using a cosine correction factor (McAteer et al., 2005a). After this stage, $B_{t-\Delta t}$ data is differentially rotated to time $t$ to correct for feature motions due to solar rotation using the latitudinal dependence derived in Howard et al. (1990).
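The thresholding, cosine correction, and rotation-rate steps might be sketched as follows. Several choices here are assumptions rather than details from the text: the correction is taken as division by the cosine of the heliocentric angle computed from pixel position, it is capped near the limb by a hypothetical `min_cos` parameter, and the rotation coefficients are the values commonly quoted for Howard et al. (1990) in solar software, giving a sidereal rate.

```python
import numpy as np

QS_THRESHOLD_G = 70.0  # static quiet-Sun threshold from the text

def threshold_and_los_correct(b, xc, yc, r_sun_pix, min_cos=0.1):
    """Zero pixels with |B| < 70 G, then divide by cos(theta).

    theta is the heliocentric angle of each pixel (assumed form of the
    cosine correction); min_cos caps the factor near the limb (assumption).
    """
    yy, xx = np.indices(b.shape)
    rho2 = ((xx - xc) ** 2 + (yy - yc) ** 2) / float(r_sun_pix) ** 2
    cos_theta = np.sqrt(np.clip(1.0 - rho2, 0.0, None))
    kept = np.where(np.abs(b) >= QS_THRESHOLD_G, b, 0.0)
    corrected = kept / np.maximum(cos_theta, min_cos)
    return np.where(rho2 < 1.0, corrected, 0.0)  # off-disk pixels zeroed

# Rotation law omega = A + B sin^2(lat) + C sin^4(lat), in microrad/s.
# Coefficient values are the ones commonly attributed to Howard et al.
# (1990); treat them as an assumption and check against the paper.
HOWARD_A, HOWARD_B, HOWARD_C = 2.894, -0.428, -0.370

def rotation_shift_deg(lat_deg, dt_seconds):
    """Sidereal longitude shift in degrees accumulated over dt_seconds."""
    s2 = np.sin(np.radians(lat_deg)) ** 2
    omega = (HOWARD_A + HOWARD_B * s2 + HOWARD_C * s2 ** 2) * 1e-6  # rad/s
    return np.degrees(omega * dt_seconds)
```

A full implementation would remap $B_{t-\Delta t}$ pixel by pixel using these longitude shifts; the sketch only computes the shift itself.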



Figure 5: A comparison of magnetic field value distributions for a quiet Sun and solar feature region. Top left: Magnetogram of NOAA 8086, gaussian smoothed using a FWHM of 5 pixels. Top right: A nearby region of quiet Sun in the same full-disk image. Bottom: The feature and quiet-Sun unsigned magnetic field distributions. The thick red line is the difference between the quiet-Sun and AR distributions and the vertical dash-dotted line denotes 70 G.

The corrected magnetograms are made binary by setting all pixels with absolute magnetic field values above the 70 G threshold equal to one, yielding masks $M_{t-\Delta t}$ and $M_t$. Features consisting of fewer than 50 pixels and those which are not present in both masks are removed by the following operations (Figure 3C). Firstly, each mask is dilated by 10 pixels to allow for region expansion. Secondly, the binary masks are subtracted such that non-zero pixels in the difference mask identify features occurring in only one of $M_{t-\Delta t}$ or $M_t$. These transient features are subsequently removed from the un-grown version of $M_t$, which is then dilated by 10 pixels to form the grown mask (Figure 3D). Individual contiguous features in the grown mask are indexed by assigning ascending integer values (beginning with one) in order of decreasing feature size. The segmentation output is an indexed grown mask ($IGM_t$), as shown by the thick red box in Figure 2.
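The transient filtering and indexing described above can be approximated with the sketch below. It is a simplified interpretation, not the published implementation: a feature in $M_t$ is kept if it has at least 50 pixels and overlaps the dilated previous mask, dilation uses an 8-connected neighborhood, and the helper names `dilate`, `label`, and `segment` are hypothetical.

```python
import numpy as np

def dilate(mask, n):
    """Grow a binary mask by n pixels using an 8-connected neighborhood."""
    out = mask.astype(bool)
    for _ in range(n):
        p = np.pad(out, 1)  # pads with False
        out = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
               | p[1:-1, :-2] | p[1:-1, 2:]
               | p[:-2, :-2] | p[:-2, 2:] | p[2:, :-2] | p[2:, 2:])
    return out

def label(mask):
    """Label 8-connected features with integers 1..n via flood fill."""
    h, w = mask.shape
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        current += 1
        labels[seed] = current
        stack = [seed]
        while stack:
            y, x = stack.pop()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = current
                        stack.append((ny, nx))
    return labels, current

def segment(m_prev, m_t, grow=10, min_area=50):
    """Return an indexed grown mask; index 1 is the largest feature."""
    m_t = m_t.astype(bool)
    grown_prev = dilate(m_prev, grow)
    labels, n = label(m_t)
    keep = np.zeros(m_t.shape, dtype=bool)
    for k in range(1, n + 1):
        feat = labels == k
        # Keep features that are large enough and persist between images.
        if feat.sum() >= min_area and (feat & grown_prev).any():
            keep |= feat
    grown_labels, gn = label(dilate(keep, grow))
    sizes = [(grown_labels == k).sum() for k in range(1, gn + 1)]
    igm = np.zeros_like(grown_labels)
    for new, old in enumerate(np.argsort(sizes)[::-1], start=1):
        igm[grown_labels == old + 1] = new
    return igm
```

The flood fill is written for clarity rather than speed; on full-disk images a library labeling routine would be the practical choice.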
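| Data Type | Identifier | Explanation |
|---|---|---|
| Feature Array | $B_{t,i}$ | extracted feature magnetogram |
| | $IGM_{t,i}$ | extracted feature mask |
| | $HG_t$ | heliographic position map |
| | $A_{cos,t,i}$ | cosine-corrected pixel area map (pixel scale 1.4 Mm/pixel) |
| | $\Phi_{t,i}$ | $B_{t,i} \times A_{cos,t,i}$ |
| | $d\Phi/dt\vert_{t,i}$ | $(\vert B_t\vert - \vert B_{t-\Delta t}\vert) \times A_{cos,t,i} / \Delta t$ |
| Property Value | $HG_{pos,t,i} \pm \sigma$ | mean over feature pixels of $HG_t \times IGM_{t,i}$ |
| | $B_{max,t,i}$ | maximum value of $B_{t,i}$ |
| | $B_{min,t,i}$ | minimum value of $B_{t,i}$ |
| | $B_{tot,t,i}$ | $\sum_{pix} B_{t,i}$ |
| | $B_{tot\,uns,t,i}$ | $\sum_{pix} \vert B_{t,i}\vert$ |
| | $\mu, \sigma^2, \gamma, \kappa$ | mean, variance, skewness, kurtosis |
| | $A_{tot,t,i}$ | $\sum_{pix} A_{cos,t,i}$ |
| | $\Phi_{+,t,i}$ | $\sum_{pix} (\Phi_{t,i} > 0)$ |
| | $\Phi_{-,t,i}$ | $\sum_{pix} (\Phi_{t,i} < 0)$ |
| | $\Phi_{uns,t,i}$ | $\sum_{pix} \vert\Phi_{t,i}\vert$ |
| | $\Phi_{imb,t,i}$ | $\vert \Phi_{+,t,i} - \vert\Phi_{-,t,i}\vert \vert / \Phi_{uns,t,i}$ |
| | $d\Phi/dt\vert_{net,t,i}$ | $\sum_{pix} d\Phi/dt\vert_{t,i}$ |

Table 1: Feature magnetic properties derived from characterization processing.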



2.2. Characterization 

The aim of SMART is to characterize ARs in a manner which does not 
make theoretical assumptions or require many observations of the same fea- 
ture. Our design is adaptable, so that the software may produce initial re- 
sults in near-realtime for operational purposes, but allows the retrospective 
addition of complex property measurements (e.g., magnetic helicity). These 
requirements define criteria for the selection of initial property calculations. 
There are many AR properties that may be derived from magnetograms. A 
subset of these are derived from 96 minute LOS data and those output by 
SMART are included in Tables 1 and 2. 

The SMART characterization process utilizes the feature mask retrieved by the methods outlined in the previous section, following the procedure detailed in Figure 6. The property measurements are derived from the magnetogram taken at time $t$, which is processed in the manner detailed below. We subscript the mask containing all features by $i$ to extract a single feature mask, $IGM_{t,i}$. A cosine-weighted area map, $A_{cos,t,i}$, is derived, which corrects pixel areas to solar surface area rather than plane-of-sky area, and is summed to yield the total feature area, $A_{tot,t,i}$.

Figure 6: Flow chart summarizing the feature magnetic property characterization method.

Full-disk magnetograms $B_{t-\Delta t}$ and $B_t$ are processed as in Section 2.1 (thresholding, LOS correction, $B_{t-\Delta t}$ differentially rotated to time $t$), but without smoothing. Single features are extracted for magnetic property determination using the indexed grown mask,

$$B_{t,i} = IGM_{t,i} \times B_t, \qquad (1)$$
| Data Type | Identifier | Explanation |
|---|---|---|
| Feature Array | $M_{PSL,t,i}$ | polarity separation line mask |
| | $M_{PSL,thin,t,i}$ | thinned polarity separation line mask |
| Property Value | $L_{PSL,t,i}$ | $\sum_{pix} M_{PSL,thin,t,i}$ |
| | $L_{sg,t,i}$ | $L_{PSL,t,i}$ restricted to pixels with $\nabla B_{t,i} > 50$ G Mm$^{-1}$ |
| | $R_{t,i}$ | R-value (1) |
| | $R^*_{t,i}$ | $\sum_{pix} (M_{PSL,t,i} * Gauss_{2D}) \times B_{t,i}$ |
| | $WL_{sg,t,i}$ | non-potentiality gauge (2) |
| | $WL^*_{sg,t,i}$ | $\sum_{pix} M_{PSL,t,i} \times \nabla B_{t,i}$ |

(1) Schrijver (2007)
(2) Falconer et al. (2008)

Table 2: Feature magnetic properties derived from polarity separation line characterization.



yielding an array where all pixels but those in the feature are set to zero. The processed magnetograms are subtracted and divided by their time separation to yield a map of the temporal change in field strength, $dB/dt|_t$, leading up to time $t$. This is combined with $A_{cos,t,i}$ to determine the flux emergence rate, $d\Phi/dt|_{t,i}$, of feature $i$.

The extracted $B_{t,i}$ is used to derive other properties of feature $i$ (as detailed in Table 1), such as statistical moments of the magnetic field and the minimum and maximum magnetic field values ($B_{min,t,i}$ and $B_{max,t,i}$). $B_{t,i}$ is multiplied by $A_{cos,t,i}$ to derive the total positive, negative, and unsigned flux ($\Phi_{+,t,i}$, $\Phi_{-,t,i}$, and $\Phi_{uns,t,i}$), the relative flux imbalance ($\Phi_{imb,t,i}$), and the net flux emergence rate ($d\Phi/dt|_{net,t,i}$).
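As an illustration of these flux calculations, the sketch below applies Equation (1) and the Table 1 definitions to a single feature. It assumes the pixel area is (1.4 Mm)$^2$ divided by the cosine of the heliocentric angle, that the imbalance is normalized by the unsigned flux, and that the caller supplies a `cos_theta` map; the function name and dictionary keys are hypothetical.

```python
import numpy as np

MM_PER_PIXEL = 1.4    # MDI pixel scale quoted in Table 1
CM2_PER_MM2 = 1.0e16  # (1 Mm)^2 expressed in cm^2

def flux_properties(b_t, b_prev, igm_t, i, cos_theta, dt_seconds):
    """Flux properties of feature i; fields in G, fluxes in Mx."""
    feat = igm_t == i
    b_ti = np.where(feat, b_t, 0.0)  # Equation (1)
    # Cosine-corrected pixel areas inside the feature (assumed form).
    area_cm2 = np.where(
        feat,
        MM_PER_PIXEL ** 2 * CM2_PER_MM2 / np.maximum(cos_theta, 0.1),
        0.0,
    )
    phi = b_ti * area_cm2
    dphi_dt = (np.abs(b_t) - np.abs(b_prev)) * area_cm2 / dt_seconds
    phi_pos = phi[phi > 0].sum()
    phi_neg = phi[phi < 0].sum()
    phi_uns = np.abs(phi).sum()
    # Relative imbalance, normalized by unsigned flux (assumption).
    phi_imb = abs(phi_pos - abs(phi_neg)) / phi_uns if phi_uns > 0 else 0.0
    return {
        "area_mm2": area_cm2.sum() / CM2_PER_MM2,
        "phi_pos": phi_pos,
        "phi_neg": phi_neg,
        "phi_uns": phi_uns,
        "phi_imb": phi_imb,
        "dphi_dt_net": dphi_dt.sum(),
    }
```

Both input magnetograms are assumed to have already been thresholded, LOS-corrected, and co-aligned as described in the text.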

The extracted feature magnetogram, $B_{t,i}$, is also used to derive properties based on the polarity separation line (PSL). Figure 7 summarizes the extraction of feature properties related to PSLs and Table 2 lists the properties derived. Initially, the feature is segmented into its positive and negative components. These components are used to create a positive and negative mask, each of which is dilated by 4 pixels. The two masks are summed and the region of mask overlap becomes the PSL binary mask, $M_{PSL,t,i}$. The algorithm then thins $M_{PSL,t,i}$ to one pixel ($M_{PSL,thin,t,i}$) and sums the non-zero pixels to determine the PSL length ($L_{PSL,t,i}$). $L_{sg,t,i}$ is obtained by summing only those pixels which have $\nabla B_{t,i} > 50$ G Mm$^{-1}$, where $\nabla B_{t,i}$ is calculated by numerical differentiation using 3-point Lagrangian interpolation. We also calculate the R-value ($R_{t,i}$) as presented in Schrijver (2007) and the WL gauge ($WL_{sg,t,i}$) as presented in Falconer et al. (2008), both of which use specific gradient and magnetic field thresholding when extracting the PSL.

Figure 7: Flow chart summarizing the quantities derived from feature polarity separation lines.

Finally, using $M_{PSL,t,i}$ we calculate $R^*_{t,i}$, which is a more sensitive version of the R-value, since it contains no gradient thresholding and the magnetic field threshold of ±70 G is much lower than the ±150 G used in Schrijver (2007). The algorithm convolves $M_{PSL,t,i}$ with a 20 × 20 pixel$^2$ kernel containing a 2D gaussian with a FWHM of 10 pixels, which is multiplied by $B_{t,i}$ and summed to achieve $R^*_{t,i}$. Similarly, an alternative of $WL_{sg,t,i}$, $WL^*_{sg,t,i}$, is calculated by applying the Falconer et al. (2008) method, but using a magnetic field threshold of ±70 G and no gradient threshold.
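The PSL mask construction and the gradient test used in this section could be sketched as below. The sketch stops short of the thinning step, so pixel counts from `psl_mask` overestimate the PSL length; it also substitutes `numpy.gradient` (central differences in the interior) for the 3-point Lagrangian differentiation named in the text. The function names and the 8-connected dilation are assumptions.

```python
import numpy as np

def dilate(mask, n):
    """Grow a binary mask by n pixels using an 8-connected neighborhood."""
    out = mask.astype(bool)
    for _ in range(n):
        p = np.pad(out, 1)  # pads with False
        out = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
               | p[1:-1, :-2] | p[1:-1, 2:]
               | p[:-2, :-2] | p[:-2, 2:] | p[2:, :-2] | p[2:, 2:])
    return out

def psl_mask(b_ti, grow=4):
    """Overlap of the dilated positive and negative polarity masks."""
    return dilate(b_ti > 0, grow) & dilate(b_ti < 0, grow)

def gradient_magnitude(b_ti, mm_per_pixel=1.4):
    """Horizontal field gradient magnitude in G per Mm."""
    gy, gx = np.gradient(b_ti, mm_per_pixel)
    return np.hypot(gx, gy)

def strong_gradient_psl(b_ti, grad_thresh=50.0):
    """PSL pixels whose gradient exceeds 50 G/Mm (un-thinned)."""
    return psl_mask(b_ti) & (gradient_magnitude(b_ti) > grad_thresh)
```

Applying a one-pixel thinning (skeletonization) to the output of `psl_mask` before counting pixels would bring the sketch closer to the length measurement described above.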
A set of data structures is created for each magnetogram including the above mentioned properties of each extracted feature (used for classification; Section 2.3) and the feature's heliographic location and time of measurement (used for feature tracking; Section 3).

Figure 8: Flow chart summarizing the feature classification method.

2.3. Classification 

At this stage the SMART algorithm has characterized the properties of each automatically extracted feature. The classification process uses these properties to discriminate between various feature types, which are saved in the algorithm output. Extracted features are initially grouped (as shown in Figure 8) into two categories: features with a flux imbalance greater than 90% are classified unipolar (U), while those having less than 90% are classified multipolar (M). After polarity balance, the total unsigned magnetic flux ($\Phi_{uns,t,i}$) is tested. Features with $\Phi_{uns,t,i}$ greater than $10^{21}$ Mx are classified as large (L), while features with $\Phi_{uns,t,i}$ less than $10^{21}$ Mx are classified as small (S). Finally, the sign of $d\Phi/dt|_{t,i}$ is tested to determine if features are increasing in flux (emerging, E) or decreasing in flux (decaying, D). The classification scheme results in eight possible feature classifications, which are also mapped to common magnetic feature designations: MLE and MLD are denoted evolving ARs; MSE and USE are denoted emerging flux concentrations (EF); MSD and USD are denoted decaying flux concentrations (DF); and ULE and ULD are denoted plage (PL). These common designations are also saved in the algorithm output, allowing one to make a quick assessment of which regions on disk are interesting from a monitoring point of view. For example, EFs may become ARs and evolving ARs may produce activity during their evolution, while PL and DF are not likely to produce activity.
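The decision tree above is small enough to write out directly. In this sketch the function name `classify` is hypothetical, and the handling of values exactly equal to a threshold (treated as multipolar, small, or decaying) is an assumption, since the text only specifies strict inequalities.

```python
# Mapping from three-letter class to the common designation in the text.
DESIGNATIONS = {
    "MLE": "AR", "MLD": "AR",
    "MSE": "EF", "USE": "EF",
    "MSD": "DF", "USD": "DF",
    "ULE": "PL", "ULD": "PL",
}

def classify(phi_imb, phi_uns_mx, dphi_dt):
    """Return (class code, designation) for one feature.

    phi_imb is the fractional flux imbalance (0 to 1), phi_uns_mx the
    total unsigned flux in Mx, and dphi_dt the flux emergence rate.
    """
    polarity = "U" if phi_imb > 0.9 else "M"
    size = "L" if phi_uns_mx > 1.0e21 else "S"
    trend = "E" if dphi_dt > 0 else "D"
    code = polarity + size + trend
    return code, DESIGNATIONS[code]
```

For example, a balanced feature with $5.9 \times 10^{22}$ Mx of unsigned flux and a positive emergence rate would be returned as `("MLE", "AR")`.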



3. Tracking 

Having detected various solar features, the SMART algorithm associates features across different time intervals. Spatial and temporal information is used to track features between consecutive images (Section 3.1) and around the far-side of the disk between consecutive solar rotations (Section 3.2). Features are then cataloged using the time of their first detection and their classification (Section 3.3).

3.1. Consecutive Images 

The set of features in a magnetogram is compared with the previous five magnetogram sets to associate previously catalogued features with the current set. Feature positions ($HG_{pos,t,i}$) are differentially rotated, using the latitudinal dependence derived in Howard et al. (1990), to the same time $t$, and features are matched when their heliographic separations are less than 5 de-
grees. Features having one classification in previous sets may be associated 
with features having a different one in the current set. Thus, the SMART 
algorithm is capable of tracking possible ARs (MLE, MLD) back to their 
first emergence as an EF (MSE). Decaying features may also be associated 
with features previously denoted as possible ARs, allowing ARs to be fol- 
lowed through their final stages of evolution. Fragmentation often occurs in 
these late stages which SMART allows for since it does not preclude multiple 
features from being associated with a single previous feature. If one feature 
splits into two, each resulting fragment will be associated with the original 
feature if the resulting fragment positions are within the matching threshold 
of the original. A letter is appended to the catalog name of each additional 
associated feature so that individual fragments may be differentiated. 
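A minimal sketch of the association step follows. It assumes previous feature positions have already been differentially rotated to time $t$, measures separation as a great-circle angle, matches each current feature to its nearest previous feature within 5 degrees, and appends lowercase letters to additional matches. The nearest-match rule, the letter sequence, and the function names are assumptions.

```python
import math

def separation_deg(lon1, lat1, lon2, lat2):
    """Great-circle separation in degrees (haversine formula)."""
    l1, p1, l2, p2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def associate(current, previous, max_sep_deg=5.0):
    """Assign catalog names to current features.

    current: list of (lon, lat) in degrees.
    previous: dict of catalog name -> (lon, lat), rotated to time t.
    Returns a list of names, with None for unmatched (new) features.
    """
    match_counts = {}
    names = []
    for lon, lat in current:
        best, best_sep = None, max_sep_deg
        for name, (plon, plat) in previous.items():
            sep = separation_deg(lon, lat, plon, plat)
            if sep < best_sep:
                best, best_sep = name, sep
        if best is None:
            names.append(None)
            continue
        count = match_counts.get(best, 0)
        match_counts[best] = count + 1
        # First match keeps the name; later fragments get a, b, c, ...
        names.append(best if count == 0 else best + chr(ord("a") + count - 1))
    return names
```

Unmatched features returned as `None` would be given a new static catalog name at this point in the pipeline.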
3.2. Far-side Passage

Features are tracked beyond the limb through multiple solar rotations to study their evolution from emergence to decay. We calculate the rotation period, $P_{rot,i}$, for each feature at time $t$, which depends on its heliographic latitude due to differential solar rotation. The feature position is compared to those in the five magnetogram sets centered on time $(t + t_{70}) - P_{rot,i}$ using the method in the previous section. In this approach the feature position is essentially rotated to a longitude of +70 degrees then back one full solar rotation, where $t_{70}$ is the time taken for the feature to rotate to 70 degrees heliographic longitude from its position at $t$. In this way the feature is constrained to have been previously detected just before west limb passage, which increases the efficiency of the algorithm.

3.3. Cataloging 

There are two identifications recorded for each detected feature in a magnetogram at time $t$. The first, $i$, is obtained from $IGM_t$ and denotes the two-digit size order of the feature. A feature within a single magnetogram is uniquely identified by $i$. The second identification is the static catalog name, YYYYMMDD.MG.NN, where YYYY is the four digit year, MM is the two digit month, and DD is the two digit day. The next two characters specify the feature type: MG denotes a photospheric magnetic feature. This scheme can be expanded to incorporate coronal holes (CH), filaments (FI), and transient features such as flares (FL) and coronal mass ejections (CE) in EUV images. Finally, NN is $i$ when the feature is given a static catalog name. This catalog name is determined once for each feature upon first detection, and is used for all measurements of the same feature as it is tracked through time.
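The naming convention can be expressed as a short formatting helper. The function name `catalog_name` is hypothetical; the format string follows the YYYYMMDD.MG.NN pattern given above.

```python
from datetime import date

def catalog_name(first_detection, size_index, feature_type="MG"):
    """Build a static catalog name of the form YYYYMMDD.TT.NN.

    first_detection: date of the first detection of the feature.
    size_index: the size-order index i at first detection.
    feature_type: two-letter type code, MG for magnetic features.
    """
    return f"{first_detection:%Y%m%d}.{feature_type}.{size_index:02d}"
```

Because the name is fixed at first detection, a tracking routine would store it with the feature rather than recomputing it from later magnetograms.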

4. Results and Discussion 

Figure 9 summarizes a comparison of NOAA and SMART AR detections over cycle 23, including numbers of detections and total feature area on disk. The top panel shows the total number of regions detected in each data set, arranged in monthly bins; the correlation coefficient between the (unbinned) daily data is 0.88. We estimate the frequency of divergence between the detections using the ratio of NOAA to SMART AR daily detections: the ratio is between zero and one 6%, equal to one 22%, between one and two 60%, and greater than two 12% of the time. We see a smaller number of SMART than NOAA AR detections 72% of the time; the mean ratio of NOAA to SMART AR detections is 1.5. This is likely due to the joining of two or more nearby sunspot groups by SMART, while NOAA identifies each individual sunspot group, regardless of proximity^4. As such, SMART detections are representative of isolated magnetic systems, while NOAA detections represent a feature recognition approach. Additionally, NOAA records detections by eye, and only if they are visible in intensity data (i.e., if there is a magnetic flux concentration with no sunspot, SMART may detect a region when NOAA does not). The bottom panel shows the total area of NOAA regions scaled to the total area of SMART regions. In fact, the NOAA area is lower by a factor of ~50, since only the low-intensity area of sunspots is summed, while the area of extended magnetic features is recorded in SMART detections. Number and area are the only two feature properties which can be directly compared, as NOAA data do not contain any magnetic property measurements.

Figure 9: A comparison of NOAA and SMART AR detections (binned by 1 month) over cycle 23. The data gap in 1998 is due to the loss of communications with the SOHO spacecraft for several months.

4 NOAA may also detect very weak sunspots which may have a $\Phi_{uns,t,i}$ too small for designation as an AR by SMART.

The determination of the magnetic properties of a feature is affected by MDI magnetogram noise levels, calibration, strong field saturation, and LOS effects. The feature detection itself is generally not affected by these phenomena, however. The instrument noise threshold of MDI is nominally ±20 G (Scherrer et al., 1995). This is smoothed by the gaussian convolution, and the segmentation threshold of ±70 G is well above this. For magnetic property calculations, a gaussian convolution is not used, so noise contributes 20 G to the uncertainty of pixel values above the QS threshold of 70 G. For SMART region 20031026.MG.11 observed at disk center on 25 November 2003, which is found to have a $A_{tot,t,i}$ of $3.8 \times 10^4$ Mm$^2$ and a $\Phi_{uns,t,i}$ of $5.9 \times 10^{22}$ Mx, the uncertainty is $7.9 \times 10^{21}$ Mx, or 5%.

Some calibration issues with the MDI data used by SMART are discussed in Wang et al. (2009). It was found that the 2008 calibration of level 1.8 data has been partially corrected, in that it does not suffer from a disk center-to-limb variation like the 2007 calibration. However, MDI may largely underestimate the magnetic field, as the ratio of MDI values to those retrieved from Hinode/Solar Optical Telescope data was found to be ~0.7. This does not affect feature detections since the effect is consistent throughout the data set, but could contribute a considerable error of ~30% for any magnetic field or flux measurements.



Strong magnetic field saturation in MDI data is discussed in Liu et al. (2007). It is estimated that this phenomenon occurs in ~5% of ARs, in
which the magnetic field measurements in the umbral areas of very strong 
sunspots behave non-linearly. In extreme cases, the umbra may appear to 
have a smaller magnetic field than the surrounding penumbra. In reality, the 
field should continue to increase in the umbra, but in level 1.8 data showing 
NOAA 9002 at disk center, saturation is clearly observed at ~3000 G. Feature 
boundaries are not affected because saturation only occurs for very strong 
sunspot umbrae, although the derived magnetic properties of features which 
include strong sunspots will be underestimated. 

LOS effects occur when features are not observed at disk center. To estimate these effects we model a circular spot with an area of $1.6 \times 10^4$ Mm$^2$ progressing to the edge of the solar disk. The LOS area is measured at longitude increments of 3 degrees and modified by the SMART cosine area correction. This is compared to the disk center area of the spot, resulting in an over-correction of ~3% when the centroid reaches 60 degrees longitude. This error depends on morphology and will be more acute for complex feature boundaries. The over-correction increases quickly to ~40% as the feature is tracked toward the limb.

Figure 10: Tracking of 20031026.MG.11 as it rotates around the Sun from 26 October to 26 December 2003 (time axis in days since 26 October 2003).

Figure 11: Feature detection and tracking cases which diverge from NOAA. A) Two bipolar regions join and subsequently fragment. B) Several small bipolar regions merge into an AR complex. C) A bipolar region is first detected as two unipolar features and then as a single bipolar region.

An example of the SMART method of feature tracking and cataloging is shown in Figure 10. Region 20031026.MG.11 is tracked from 26 October 2003 to 26 December 2003. The AR rotates beyond the west limb and is detected again upon returning at the east limb twice. Although the AR is tracked to subsequent solar rotations, its catalog name remains the same when it returns. NOAA first detects this AR on 28 October 2003, designating it as NOAA 10488. When the region returns it is assigned a new region number, NOAA 10507, and is renamed upon the second return as NOAA 10525. SMART's persistent naming through multiple rotations allows independent measurements of the same feature to be grouped into a single time plot.

The top panels in Figure 10 show MDI magnetograms of the region (clipped at ±1000 G) on three different dates, with the extracted AR outlined by a thick white contour (other detections are outlined in blue). A connecting red line shows where each falls on the timeline below. The remaining panels show, from top to bottom, time series of total unsigned flux ($\Phi_{uns,t,i}$), heliographic longitude ($HG_{pos,t,i}$), PSL length ($L_{PSL,t,i}$), and R-value ($R_{t,i}$) extracted from 20031026.MG.11. Vertical dotted green (blue) lines denote crossings at ±60 degrees of the leading (trailing) edge of the feature; in the second time plot, the green (blue) curve tracks this leading (trailing) edge in time. In the plot of PSL length, the black curve sums the length of all detected PSL segments ($L_{PSL,t,i}$), while the light-blue curve sums those having a gradient above 50 G Mm$^{-1}$ ($L_{sg,t,i}$). Finally, the plot of R-value shows $R^*_{t,i}$ in black and $R_{t,i}$ in blue.

The stability of the algorithm is estimated using the plot of
$\Phi_{uns,t,i}$ between days 25.6 (20 November 14:24 UT) and 33.7
(28 November 16:48 UT). A quadratic fit is subtracted to remove the long
timescale variation, resulting in an array of residuals. The two-sigma error
of the residuals is determined to be $2.1 \times 10^{21}$ Mx, or 3% of the
mean of $\Phi_{uns,t,i}$. The stability estimate is particular to this
example, as cases such as those shown in Figure 11 could result in much larger
short timescale variation.
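
The stability estimate described above (subtract a quadratic fit, then take
the two-sigma scatter of the residuals) can be sketched as follows. This is a
minimal illustration under stated assumptions, not the code used for the
paper: the function name `two_sigma_stability` is hypothetical, the population
standard deviation is assumed for "sigma", and the flux series is synthetic.

```python
import numpy as np

def two_sigma_stability(days, flux):
    """Two-sigma scatter of flux about a quadratic trend.

    Returns (two_sigma, percent_of_mean), where percent_of_mean expresses
    the two-sigma scatter as a percentage of the mean flux.
    """
    days = np.asarray(days, dtype=float)
    flux = np.asarray(flux, dtype=float)
    # Fit and remove the long-timescale variation with a quadratic.
    coefficients = np.polyfit(days, flux, deg=2)
    residuals = flux - np.polyval(coefficients, days)
    two_sigma = 2.0 * np.std(residuals)
    return two_sigma, 100.0 * two_sigma / np.mean(flux)

# Synthetic example only: a smooth trend plus Gaussian noise (values assumed).
rng = np.random.default_rng(0)
days = np.linspace(25.6, 33.7, 200)
trend = 7.0e22 + 1.0e21 * (days - 25.6) - 5.0e19 * (days - 25.6) ** 2
flux = trend + rng.normal(0.0, 1.0e21, days.size)
two_sigma, percent = two_sigma_stability(days, flux)
print(f"two-sigma = {two_sigma:.2e} Mx ({percent:.1f}% of the mean)")
```

Removing the quadratic first matters because the slow evolution of the region
would otherwise dominate the scatter and inflate the estimate.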

There are several recurrent SMART feature tracking cases which diverge
from what would be expected of NOAA (Figure 11). The SMART tracking
algorithm allows features to converge and split apart. However, there may be
side-effects, such as when a fragment separates from a larger feature and is
given a new catalog name because the separation between the two centroids
exceeds the tracking association threshold (top row). Also, an active region
complex may be detected when there are multiple strong field ARs in close
proximity (middle row). Finally, a bipolar region which is significantly
disjointed and weak may not be properly grouped into a single region (bottom
row). Here we see an example where each polarity is detected as a separate
region. As this work is designed to aid in flare forecasting, many examples of
each of these cases may be studied to determine if they possess unexpected
flaring properties. Also, their evolution may be studied by tracking the
features from first emergence. The frequency of occurrence for these special
cases can be estimated using the data and analysis of Figure EE: when
$N_{NOAA}$ is greater than $N_{SMART}$, SMART is likely grouping regions into
AR complexes (or identifying NOAA ARs as EF or DF), and when $N_{SMART}$ is
greater than $N_{NOAA}$, SMART may be detecting individual unipolar features
when NOAA groups them into bipolar regions.
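
The renaming of a separated fragment can be illustrated with a simplified
nearest-centroid association. This is a sketch only and not the SMART tracking
code: the function `associate_features`, the 5-degree threshold, and the plain
Euclidean distance in heliographic degrees are all assumptions, and the
previous centroids are assumed to have already been rotated to the time of
the current image.

```python
import math

def associate_features(previous, current, threshold_deg):
    """Match current centroids to previously named features.

    previous: dict mapping catalog name -> (lon_deg, lat_deg)
    current:  list of (lon_deg, lat_deg) centroids in the new image
    Returns one entry per current feature: the inherited catalog name, or
    None when no previous centroid lies within threshold_deg (such a
    feature would need a new catalog name).
    """
    # Every candidate pairing, closest first.
    pairs = sorted(
        (math.dist(centroid, old_centroid), index, name)
        for index, centroid in enumerate(current)
        for name, old_centroid in previous.items()
    )
    names = [None] * len(current)
    used = set()
    for distance, index, name in pairs:
        if distance > threshold_deg:
            break  # all remaining pairs are farther apart
        if names[index] is None and name not in used:
            names[index] = name
            used.add(name)
    return names

# A fragment that has drifted away from its parent exceeds the threshold.
previous = {"parent": (10.0, -8.0)}
current = [(11.0, -8.5), (19.0, -3.0)]
print(associate_features(previous, current, threshold_deg=5.0))
```

Here the first feature inherits the name of its parent, while the distant
fragment is left unmatched and would be cataloged as a new feature.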



5. Conclusions 

The SMART algorithm allows one to monitor ARs on the solar disk
in near-realtime and perform extensive studies on AR magnetic properties.
SMART is unique among automated AR extraction algorithms in that it
allows the temporal analysis of magnetic properties from birth through
multiple solar rotations. Future work will include the analysis of trends in
AR evolution over the solar cycle. This is a largely untouched subject that
raises important questions, such as whether ARs are born destined to flare or
randomly evolve to become flare-active. This may also provide new insights
into the behavior of the solar dynamo.

Previous algorithms include some of the functions performed by the SMART
algorithm, such as feature and magnetic parameter extraction. However,
new utilities are incorporated into the SMART code, such as day-to-day and
multiple rotation feature tracking. Extensive AR properties such as area
($A_{tot,t,i}$) and total magnetic flux ($\Phi_{uns,t,i}$) are determined, as
are intensive properties such as the maximum magnetic field ($B_{max,t,i}$)
and statistical moments ($\mu$, $\sigma$, $\gamma$, $\kappa$). Some
algorithms, including LaBonte et al. (2007), only detect the largest regions,
while others like Colak & Qahwaji (2009) only detect ARs with sunspots in
white-light images. All current algorithms track ARs using visually identified
NOAA specifications. The SMART algorithm is independent of these
specifications and needs no human intervention to detect and track ARs.
Additionally, it utilizes an improved feature cataloging system which
incorporates the date of first detection and the feature type.

The SMART algorithm will be used to create a comprehensive catalog
of features present in magnetograms covering the entirety of solar cycle 23
and will be adapted to use SDO/Helioseismic and Magnetic Imager data. A
pipeline version of the algorithm will output detections for inclusion in the
Heliophysics Event Knowledgebase¹. Additionally, it will form part of HELIO.
In this application, ARs tracked using SMART will be associated with
a chain of features and events propagating throughout the heliosphere, such
as EUV loops, flares, CMEs, magnetic disturbances and storms detectable
in Earth's aurorae and ground-based magnetometer data, as well as in data
from distant particle instruments such as those on the Voyager and Mercury
Surface, Space Environment, Geochemistry, and Ranging (MESSENGER) spacecraft.

The magnetic properties of ARs retrieved by the SMART algorithm will
also be used for flare forecasting. While the magnetic complexity of ARs is
known to be an important predictor of flare activity (Sammis et al., 2000;
Schrijver, 2007; McAteer et al., 2005b; Conlon et al., 2008), recent work by
Welsch et al. (2009) shows that extensive magnetic properties outperform
intensive properties as predictors of AR flare activity. One of SolarMonitor's
current flare-forecasting algorithms assumes Poisson statistics (Moon et al.,
2001; Wheatland, 2001; Gallagher et al., 2002) and relies on historical flaring
rates from 1988 to 1996 for each McIntosh sunspot classification (McIntosh,
1990). This will be superseded by a statistical forecasting algorithm that
makes use of extensive AR magnetic properties determined by SMART.
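
For reference, the Poisson assumption leads to a simple expression for the
probability of at least one flare in a forecast window: with a mean flaring
rate $\lambda$, $P = 1 - e^{-\lambda \Delta t}$. The sketch below shows only
this standard relation; the function name `flare_probability` is
hypothetical and the rate passed to it is an illustrative value, not a
historical rate from the cited works.

```python
import math

def flare_probability(rate_per_day, window_days=1.0):
    """Probability of one or more flares in the window, assuming Poisson statistics.

    rate_per_day: mean flaring rate (flares per day), for example a
                  historical average for a given sunspot classification.
    """
    if rate_per_day < 0.0 or window_days < 0.0:
        raise ValueError("rate and window must be non-negative")
    # 1 - exp(-rate * window), computed with expm1 for accuracy at small rates.
    return -math.expm1(-rate_per_day * window_days)

# Illustrative rate only (assumed, not taken from the cited works).
print(f"{100.0 * flare_probability(0.2):.1f}% chance of a flare within 24 hours")
```

Because the probability depends only on the mean rate, a forecast of this
kind is only as good as the historical rate assigned to each region.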

Any forecasting algorithm which makes use of magnetic properties output
by SMART will need to take into account several sources of error. Random
errors including magnetogram noise and algorithm stability for the example
presented in Section 4 result in an error of ±5% and ±3% in $\Phi_{uns,t,i}$,
respectively. This will not affect the forecasting potential of properties
involving $\Phi_{uns,t,i}$ for a sufficiently large sample of regions.
Calibration errors in MDI result in an underestimate of the true magnetic
field on average by ~30%. If the forecasting training set and test samples
both exhibit this error, the prediction result will not be affected. However,
for physical studies of energetics this must be taken into account. Finally,
LOS effects which occur as regions approach the limb cause large measurement
errors past 60 heliographic degrees from disk center, which limits the
potential forecasting range of this algorithm.



¹ See: http://www.lmsal.com/helio-informatics/hpkb/index.html